Background¶
Spotify invested over $1 billion into podcasts between 2018 and 2022. The strategy focused on acquiring and producing exclusive podcast content to boost its market presence.
- "The Joe Rogan Experience": ~$100 million deal, with 11 million listeners per episode.
- "Archetypes" by the Sussexes: $20 million contract, discontinued after a single season.
- Partnership with the Obamas: estimated $30 million contract.
Despite these significant expenditures, financial returns have lagged, with Spotify reporting a net loss of €430 million in 2022 and ongoing deficits. As a result, the company has reevaluated its approach, streamlining operations and canceling underperforming shows.
Spotify now focuses on sustainable growth through content diversification and bolstering podcast advertising. The success of these strategic adjustments in the competitive and fluctuating digital ad space remains to be seen.
Introduction¶
The purpose of this analysis is to examine whether Spotify made the right choice in focusing on a small number of very expensive podcast deals, or whether other approaches might have made more sense in the long term.
Specifically, we'll look into:
- How successful was Spotify in growing its podcast listener base?
- Did major podcasts outpace others in attracting listeners, justifying Spotify's past investment focus?
We'll be using the Podcast Reviews dataset from Kaggle.
Popularity and average top 10% percentile count over time¶
We can see that the share of reviews going to the top 5%, and especially the top 1%, of podcasts started increasing significantly after 2018. This suggests that Spotify's strategy succeeded in attracting new listeners, and that those listeners disproportionately listened to a small number of the most popular podcasts compared with previous years.
We see a significant falloff after 2021, which probably has several explanations:
- Media consumption in general decreased in the aftermath of the Covid pandemic.
- We're using review count as a proxy for popularity, which is problematic because a user can only leave a single review per podcast, so we can't track whether they continued listening to those podcasts.
Some questions we need to consider: if Spotify's investment in podcasts (both in specific shows and in overall infrastructure) resulted in significant user growth, did most of these users:
- Disproportionately listen to these most expensive/most popular podcasts?
- If so, did these new users later engage with other podcasts as well?
- Were the new users retained over a longer period, or was there a significant drop-off?
Did most of the growth go to the top 1% of podcasts?¶
Hypothesis I¶
Did Spotify's investment and overall strategy of focusing on a small number of creators prove effective? Specifically, did the popularity of the most popular podcasts (defined as the top 1% by number of reviews) grow faster than that of other podcasts? Based on this question, we formulate our first hypothesis:
H1: The number of reviews for the most popular podcasts is increasing at a faster rate than for the bottom 99% of all podcasts.
To test this hypothesis, we follow these steps:
- Transform the reviews_by_month_count_df_after_2015 dataframe to show the monthly growth rate for the top 1% and bottom 99% of podcasts.
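The transformation described above can be sketched as follows. This is a minimal illustration, assuming the growth rate is the month-over-month relative change in review count; the function name and the monthly counts are hypothetical and are not taken from the notebook or the dataset. In pandas, `pct_change` offers the same month-over-month calculation.

```python
from statistics import mean, stdev

def monthly_growth_rates(counts):
    """Return the month-over-month relative change for a list of monthly counts."""
    rates = []
    for prev, curr in zip(counts, counts[1:]):
        if prev == 0:
            continue  # growth from zero is undefined, so that month is skipped
        rates.append((curr - prev) / prev)
    return rates

# Hypothetical monthly review counts for each group (illustration only).
top_1_percent = [100, 120, 90, 135]
bottom_99_percent = [1000, 1020, 1010, 1050]

top_rates = monthly_growth_rates(top_1_percent)
bottom_rates = monthly_growth_rates(bottom_99_percent)

print(f"Top 1%: mean={mean(top_rates):.2f}, std={stdev(top_rates):.2f}")
print(f"Bottom 99%: mean={mean(bottom_rates):.2f}, std={stdev(bottom_rates):.2f}")
```

Skipping months that follow a zero count is one possible design choice; another is to drop the early, sparse months entirely before computing rates.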
Growth rate for top 1%:
Mean: 0.08 (Std Dev: 0.44)
Growth rate for bottom 99%:
Mean: 0.02 (Std Dev: 0.14)
To decide the appropriate test, the following should be considered:
- Data should be normally distributed or the sample size should be large.
- Variances of the two groups being compared should be equal.
If these assumptions do not hold, a non-parametric test like the Mann-Whitney U Test should be used.
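The decision rule above can be sketched with SciPy. This is an illustration under stated assumptions: the helper name `choose_test`, the 0.05 threshold as a parameter, and the synthetic samples are all hypothetical, and the sketch deliberately ignores the "large sample size" exception mentioned in the first assumption.

```python
import numpy as np
from scipy import stats

def choose_test(sample_a, sample_b, alpha=0.05):
    """Return "t-test" only if both samples look normal and variances look equal."""
    # Shapiro-Wilk: a p-value below alpha is evidence against normality.
    looks_normal = all(stats.shapiro(s)[1] >= alpha for s in (sample_a, sample_b))
    # Levene: a p-value below alpha is evidence against equal variances.
    equal_variances = stats.levene(sample_a, sample_b)[1] >= alpha
    return "t-test" if looks_normal and equal_variances else "mann-whitney-u"

# Synthetic, heavily right-skewed samples standing in for monthly growth rates.
rng = np.random.default_rng(0)
skewed_a = rng.exponential(scale=0.4, size=300)
skewed_b = rng.exponential(scale=0.1, size=300)

print(choose_test(skewed_a, skewed_b))
```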
Shapiro-Wilk Test for Normality:
Test Stat: 0.76
P-value: 0.0 (if the p-value is below 0.05, we reject the hypothesis that the data is normally distributed).
This indicates that a non-parametric test should be used.
Levene Test for Homogeneity of Variances:
Test Stat: 40.53
P-value: 0.0
This also supports the decision to use a non-parametric test.
Mann-Whitney U Test:
U-value: 40.53
P-value: 0.0
The p-value indicates a significant difference in growth rates between the top 1% and bottom 99%. This supports the initial hypothesis.
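To make the U statistic easier to interpret, the sketch below computes it directly from its definition: the number of pairs (one value from each group) in which the first group's value is larger, with ties counted as one half. U therefore ranges from 0 to n1 × n2, and a value near n1 × n2 / 2 is expected when the groups do not differ. The function name and the sample growth rates are hypothetical; in practice a library routine such as `scipy.stats.mannwhitneyu` would also supply the p-value.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return U for sample_a: the count of (a, b) pairs with a > b, ties as 0.5."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

# Hypothetical monthly growth rates (illustration only).
top = [0.30, -0.10, 0.45, 0.08]
bottom = [0.02, 0.01, 0.03, -0.01]

u = mann_whitney_u(top, bottom)
max_u = len(top) * len(bottom)
print(f"U = {u} out of a maximum of {max_u}")
```

The pairwise loop is quadratic in the sample sizes, which is acceptable for a few hundred monthly observations but not for large samples.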
Comparison of Average Growth Rates:
Average growth for top 1%: 0.08
Average growth for the bottom 99%: 0.02
Distribution of Podcasts by Popularity:¶
Gini coefficient is: 0.93
We can further see that reviews are distributed extremely unevenly across podcasts. Specifically, the top 1% of all podcasts by review count account for 57% of all reviews.
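Both concentration measures can be sketched in a few lines. This is a minimal illustration, assuming the Gini coefficient is computed over per-podcast review counts with the standard rank-weighted formula; the function names and the sample counts are hypothetical and are not the notebook's own implementation.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = equal, near 1 = concentrated)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted formula: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

def top_share(values, fraction=0.01):
    """Share of the total held by the top `fraction` of items (at least one item)."""
    xs = sorted(values, reverse=True)
    k = max(1, int(len(xs) * fraction))
    return sum(xs[:k]) / sum(xs)

# Hypothetical review counts: 99 small podcasts and one very popular one.
review_counts = [1] * 99 + [101]

print(f"Gini: {gini(review_counts):.2f}")
print(f"Top 1% share: {top_share(review_counts):.1%}")
```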
User Engagement Analysis¶
Distribution of review count by user:
- mean: 1.34
- median: 1.00
- stdev: 1.83
- skewness*: 98.11
- kurtosis**: 21137.37
The skewness is extremely high and indicates a very strong right skew: most values are clustered on the left, with a few extremely large values on the right.
* Skewness measures the direction and degree of asymmetry. A positive skew indicates that the tail is on the right side of the distribution.
** High kurtosis means more of the variance is the result of infrequent extreme deviations.
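For reference, skewness and kurtosis can be computed from central moments as sketched below. This assumes population (biased) moments and excess kurtosis (a normal distribution scores 0); the notebook may use a bias-corrected estimator, so values can differ slightly. The function name and the sample data are hypothetical.

```python
from statistics import mean

def skewness_and_kurtosis(values):
    """Return (skewness, excess kurtosis) using population central moments."""
    n = len(values)
    m = mean(values)
    m2 = sum((x - m) ** 2 for x in values) / n  # variance
    m3 = sum((x - m) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - m) ** 4 for x in values) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    return skew, excess_kurtosis

# Hypothetical per-user review counts: most users leave one review, one leaves many.
reviews_per_user = [1] * 9 + [100]

skew, kurt = skewness_and_kurtosis(reviews_per_user)
print(f"skewness={skew:.2f}, excess kurtosis={kurt:.2f}")
```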
| | author_id | review_count | min_rating | max_rating | avg_rating | top_percentile_review_count | bin | other_reviews |
|---|---|---|---|---|---|---|---|---|
| 0 | D3307ADEFFA285C | 614 | 5 | 5 | 5.0 | 103 | 5+ | 511 |
| 1 | 96A3F0358E7808D | 472 | 5 | 5 | 5.0 | 61 | 5+ | 411 |
| 2 | 04C3B6DEFE22FB1 | 433 | 5 | 5 | 5.0 | 1 | 5+ | 432 |
| 3 | 8858ED4249110B6 | 394 | 5 | 5 | 5.0 | 2 | 5+ | 392 |
| 4 | 7F9EA8B1A449BFC | 361 | 5 | 5 | 5.0 | 8 | 5+ | 353 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1475280 | 67E028A05FF859F | 1 | 5 | 5 | 5.0 | 0 | 1 | 1 |
| 1475281 | 844C8FBDFC42A79 | 1 | 5 | 5 | 5.0 | 0 | 1 | 1 |
| 1475282 | 2FC8411076C958D | 1 | 5 | 5 | 5.0 | 0 | 1 | 1 |
| 1475283 | CD76019E16F7D8A | 1 | 2 | 2 | 2.0 | 0 | 1 | 1 |
| 1475284 | A28C38545EA5D47 | 1 | 5 | 5 | 5.0 | 1 | 1 | 0 |
1475285 rows × 8 columns
Podcast Genre/Category Analysis¶
In this section we'll examine whether the distribution of reviews varies significantly by podcast category/genre tag.
Total Unique Podcasts: 110024
array(['society', 'comedy', 'business', 'education', 'health', 'sports',
'news', 'tv', 'true-crime', 'religion', 'arts', 'leisure', 'kids',
'christianity', 'music'], dtype=object)
array(['true-crime', 'comedy', 'news', 'society', 'kids', 'education',
'religion', 'sports', 'tv', 'health', 'business', 'music', 'arts',
'christianity', 'leisure'], dtype=object)
Hypothesis II¶
We can see that the proportion of reviews belonging to the top 1% of podcasts varies widely between genres. Based on this, we can check in which categories the proportion of reviews belonging to the top 1% increased the most.
| | category | podcast_id | year_month | review_count | top_level_category |
|---|---|---|---|---|---|
| 0 | business | a00018b54eb342567c94dacfb2a3e504 | 2017-10-31 | 1 | business |
| 1 | christianity | a00043d34e734b09246d17dc5d56f63c | 2019-09-30 | 1 | christianity |
| 2 | religion-spirituality | a00043d34e734b09246d17dc5d56f63c | 2019-09-30 | 1 | religion |
| 3 | religion-spirituality | a0004b1ef445af9dc84dad1e7821b1e3 | 2011-08-31 | 1 | religion |
| 4 | spirituality | a0004b1ef445af9dc84dad1e7821b1e3 | 2011-08-31 | 1 | spirituality |
| ... | ... | ... | ... | ... | ... |
| 1247729 | news | ffff32caeedd6254573ad1cc49852595 | 2018-02-28 | 1 | news |
| 1247745 | arts | ffff5db4b5db2d860c49749e5de8a36d | 2011-05-31 | 1 | arts |
| 1247759 | comedy | ffff66f98c1adfc8d0d6c41bb8facfd0 | 2018-09-30 | 4 | comedy |
| 1247761 | education | ffff923482740bc21a0fe184865ec2e2 | 2018-04-30 | 1 | education |
| 1247763 | comedy | ffffbd44ec5f79d502f16ae372bf2d4f | 2021-08-31 | 1 | comedy |
151349 rows × 5 columns
Index(['category', 'podcast_id', 'year_month', 'review_count',
'top_level_category'],
dtype='object')
| | top_level_category | post_cutoff | is_top_1_percent | review_count | total | prop_of_all_reviews |
|---|---|---|---|---|---|---|
| 1 | arts | False | True | 3574 | 16557 | 0.215860 |
| 3 | arts | True | True | 2285 | 9112 | 0.250768 |
| 5 | buddhism | False | True | 47 | 184 | 0.255435 |
| 8 | business | False | True | 9377 | 33327 | 0.281363 |
| 10 | business | True | True | 4451 | 19714 | 0.225779 |
| 12 | christianity | False | True | 1757 | 10361 | 0.169578 |
| 14 | christianity | True | True | 2201 | 7094 | 0.310262 |
| 16 | comedy | False | True | 8684 | 30434 | 0.285339 |
| 18 | comedy | True | True | 5584 | 15611 | 0.357696 |
| 20 | education | False | True | 6855 | 24218 | 0.283054 |
| 22 | education | True | True | 3860 | 19103 | 0.202063 |
| 24 | fiction | False | True | 841 | 2451 | 0.343125 |
| 26 | fiction | True | True | 1084 | 3467 | 0.312662 |
| 28 | government | False | True | 594 | 1936 | 0.306818 |
| 30 | government | True | True | 127 | 857 | 0.148191 |
| 32 | health | False | True | 4417 | 17304 | 0.255259 |
| 34 | health | True | True | 2611 | 14157 | 0.184432 |
| 36 | hinduism | False | True | 12 | 34 | 0.352941 |
| 39 | history | False | True | 1541 | 4142 | 0.372042 |
| 41 | history | True | True | 393 | 2545 | 0.154420 |
| 43 | islam | False | True | 23 | 257 | 0.089494 |
| 45 | islam | True | True | 33 | 124 | 0.266129 |
| 47 | judaism | False | True | 40 | 246 | 0.162602 |
| 49 | judaism | True | True | 28 | 258 | 0.108527 |
| 51 | kids | False | True | 1808 | 7357 | 0.245752 |
| 53 | kids | True | True | 1572 | 5565 | 0.282480 |
| 55 | leisure | False | True | 3168 | 10201 | 0.310558 |
| 57 | leisure | True | True | 1052 | 6776 | 0.155254 |
| 59 | music | False | True | 2106 | 8978 | 0.234573 |
| 61 | music | True | True | 1551 | 4620 | 0.335714 |
| 63 | news | False | True | 3627 | 11611 | 0.312376 |
| 65 | news | True | True | 3560 | 10340 | 0.344294 |
| 67 | religion | False | True | 3255 | 17109 | 0.190251 |
| 69 | religion | True | True | 2993 | 10715 | 0.279328 |
| 71 | science | False | True | 1011 | 3372 | 0.299822 |
| 73 | science | True | True | 294 | 2073 | 0.141823 |
| 75 | society | False | True | 13055 | 41444 | 0.315003 |
| 77 | society | True | True | 6256 | 26072 | 0.239951 |
| 79 | spirituality | False | True | 1083 | 4115 | 0.263183 |
| 81 | spirituality | True | True | 688 | 2726 | 0.252384 |
| 83 | sports | False | True | 5028 | 16407 | 0.306455 |
| 85 | sports | True | True | 2759 | 11894 | 0.231966 |
| 87 | technology | False | True | 1646 | 6719 | 0.244977 |
| 89 | technology | True | True | 445 | 1915 | 0.232376 |
| 91 | true-crime | False | True | 2325 | 5044 | 0.460944 |
| 93 | true-crime | True | True | 911 | 5503 | 0.165546 |
| 95 | tv | False | True | 4485 | 16644 | 0.269466 |
| 97 | tv | True | True | 2837 | 8515 | 0.333177 |
| | top_level_category | pre_cutoff_ratio | post_cutoff_ratio | pre_cutoff_review_count | post_cutoff_review_count | relative_change_in_ratio | sum_review_count |
|---|---|---|---|---|---|---|---|
| 0 | arts | 0.215860 | 0.250768 | 3574.0 | 2285.0 | 0.161715 | 5859.0 |
| 2 | business | 0.281363 | 0.225779 | 9377.0 | 4451.0 | -0.197555 | 13828.0 |
| 3 | christianity | 0.169578 | 0.310262 | 1757.0 | 2201.0 | 0.829611 | 3958.0 |
| 4 | comedy | 0.285339 | 0.357696 | 8684.0 | 5584.0 | 0.253585 | 14268.0 |
| 5 | education | 0.283054 | 0.202063 | 6855.0 | 3860.0 | -0.286134 | 10715.0 |
| 6 | fiction | 0.343125 | 0.312662 | 841.0 | 1084.0 | -0.088781 | 1925.0 |
| 7 | government | 0.306818 | 0.148191 | 594.0 | 127.0 | -0.517006 | 721.0 |
| 8 | health | 0.255259 | 0.184432 | 4417.0 | 2611.0 | -0.277472 | 7028.0 |
| 10 | history | 0.372042 | 0.154420 | 1541.0 | 393.0 | -0.584939 | 1934.0 |
| 11 | islam | 0.089494 | 0.266129 | 23.0 | 33.0 | 1.973703 | 56.0 |
| 12 | judaism | 0.162602 | 0.108527 | 40.0 | 28.0 | -0.332558 | 68.0 |
| 13 | kids | 0.245752 | 0.282480 | 1808.0 | 1572.0 | 0.149449 | 3380.0 |
| 14 | leisure | 0.310558 | 0.155254 | 3168.0 | 1052.0 | -0.500081 | 4220.0 |
| 15 | music | 0.234573 | 0.335714 | 2106.0 | 1551.0 | 0.431169 | 3657.0 |
| 16 | news | 0.312376 | 0.344294 | 3627.0 | 3560.0 | 0.102177 | 7187.0 |
| 17 | religion | 0.190251 | 0.279328 | 3255.0 | 2993.0 | 0.468210 | 6248.0 |
| 18 | science | 0.299822 | 0.141823 | 1011.0 | 294.0 | -0.526975 | 1305.0 |
| 19 | society | 0.315003 | 0.239951 | 13055.0 | 6256.0 | -0.238259 | 19311.0 |
| 20 | spirituality | 0.263183 | 0.252384 | 1083.0 | 688.0 | -0.041032 | 1771.0 |
| 21 | sports | 0.306455 | 0.231966 | 5028.0 | 2759.0 | -0.243067 | 7787.0 |
| 22 | technology | 0.244977 | 0.232376 | 1646.0 | 445.0 | -0.051437 | 2091.0 |
| 23 | true-crime | 0.460944 | 0.165546 | 2325.0 | 911.0 | -0.640854 | 3236.0 |
| 24 | tv | 0.269466 | 0.333177 | 4485.0 | 2837.0 | 0.236431 | 7322.0 |
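The ratio columns in the tables above appear to follow two simple formulas: the proportion is the top 1% review count divided by the category total, and the relative change is (post − pre) / pre. This is inferred from the numbers rather than from the notebook's code, so treat it as an assumption; the function names are hypothetical. The sketch checks the formulas against the "arts" row.

```python
def proportion(part, total):
    """Share of a category's reviews that belong to top 1% podcasts."""
    return part / total

def relative_change(pre_ratio, post_ratio):
    """Relative change of the post-cutoff ratio with respect to the pre-cutoff ratio."""
    return (post_ratio - pre_ratio) / pre_ratio

# Values for the "arts" category, taken from the tables above.
pre = proportion(3574, 16557)
post = proportion(2285, 9112)
change = relative_change(pre, post)

print(f"pre={pre:.6f}, post={post:.6f}, relative change={change:.6f}")
```

Because the change is relative to the pre-cutoff ratio, categories with very few reviews (for example islam, with 56 top 1% reviews in total) can show very large swings that should be read with caution.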
| | top_level_category | year_month_first | prop_top_1_percent_first | year_month_last | prop_top_1_percent_last | prop_change |
|---|---|---|---|---|---|---|
| 15 | music | 2005-11-30 | 0.0 | 2022-12-31 | 0.891667 | 0.891667 |
| 19 | society | 2005-11-30 | 0.0 | 2022-12-31 | 0.805195 | 0.805195 |
| 8 | health | 2005-11-30 | 0.0 | 2022-12-31 | 0.365854 | 0.365854 |
| 0 | arts | 2005-11-30 | 0.0 | 2022-12-31 | 0.000000 | 0.000000 |
| 13 | kids | 2005-11-30 | 0.0 | 2022-12-31 | 0.000000 | 0.000000 |
| 23 | true-crime | 2015-06-30 | 0.0 | 2022-12-31 | 0.000000 | 0.000000 |
| 22 | technology | 2005-11-30 | 0.0 | 2022-10-31 | 0.000000 | 0.000000 |
| 21 | sports | 2005-11-30 | 0.0 | 2022-12-31 | 0.000000 | 0.000000 |
| 20 | spirituality | 2005-11-30 | 0.0 | 2022-11-30 | 0.000000 | 0.000000 |
| 18 | science | 2005-11-30 | 0.0 | 2022-12-31 | 0.000000 | 0.000000 |
| 17 | religion | 2005-11-30 | 0.0 | 2022-12-31 | 0.000000 | 0.000000 |
| 16 | news | 2005-11-30 | 0.0 | 2022-11-30 | 0.000000 | 0.000000 |
| 14 | leisure | 2005-11-30 | 0.0 | 2022-12-31 | 0.000000 | 0.000000 |
| 12 | judaism | 2005-12-31 | 0.0 | 2022-08-31 | 0.000000 | 0.000000 |
| 1 | buddhism | 2005-11-30 | 0.0 | 2022-05-31 | 0.000000 | 0.000000 |
| 11 | islam | 2005-12-31 | 0.0 | 2022-08-31 | 0.000000 | 0.000000 |
| 10 | history | 2005-11-30 | 0.0 | 2022-11-30 | 0.000000 | 0.000000 |
| 9 | hinduism | 2006-11-30 | 0.0 | 2021-09-30 | 0.000000 | 0.000000 |
| 7 | government | 2005-11-30 | 0.0 | 2022-10-31 | 0.000000 | 0.000000 |
| 6 | fiction | 2005-12-31 | 0.0 | 2022-11-30 | 0.000000 | 0.000000 |
| 5 | education | 2005-11-30 | 0.0 | 2022-12-31 | 0.000000 | 0.000000 |
| 4 | comedy | 2005-11-30 | 0.0 | 2022-12-31 | 0.000000 | 0.000000 |
| 3 | christianity | 2005-11-30 | 0.0 | 2022-12-31 | 0.000000 | 0.000000 |
| 2 | business | 2005-11-30 | 0.0 | 2022-12-31 | 0.000000 | 0.000000 |
| 24 | tv | 2005-11-30 | 0.0 | 2022-12-31 | 0.000000 | 0.000000 |
'runs'
| | column_name | data_type |
|---|---|---|
| 0 | run_at | text |
| 1 | max_rowid | integer |
| 2 | reviews_added | integer |
'podcasts'
| | column_name | data_type |
|---|---|---|
| 0 | podcast_id | text |
| 1 | itunes_id | integer |
| 2 | slug | text |
| 3 | itunes_url | text |
| 4 | title | text |
'categories'
| | column_name | data_type |
|---|---|---|
| 0 | podcast_id | text |
| 1 | category | text |
'reviews'
| | column_name | data_type |
|---|---|---|
| 0 | author_id | text |
| 1 | podcast_id | text |
| 2 | created_at | text |
| 3 | title | text |
| 4 | content | text |
| 5 | rating | integer |
| 6 | created_at_dt | timestamp with time zone |